Introduction to Machine Learning in Python with Scikit-learn

This workshop will teach you how to perform machine learning for prediction in Python using the widely used Scikit-learn package. You will be introduced to best practices for creating and selecting machine learning models, including data splitting, pre-processing, parameter and model optimization, and visualizing and communicating results. Workshop examples will begin with simple, intuitive models (e.g., K-nearest neighbors, linear regression) and then demonstrate more widely used, industry-standard models (e.g., L1-regularized (Lasso) regression and Light Gradient Boosting Machine (LightGBM) models). Throughout, the workshop focuses on how to do this with the modern Scikit-learn pipeline syntax.
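As a rough preview of the pipeline workflow described above, the sketch below splits the data, puts scaling and the model into one Pipeline, and uses GridSearchCV to tune parameters and choose between a K-nearest neighbors model and an L1-regularized (Lasso) model. The synthetic dataset, parameter grids, and scoring metric are illustrative assumptions, not the workshop's own materials, and LightGBM is left out so the example needs only scikit-learn.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data stands in for a real dataset (assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Hold out a test set before any fitting, so preprocessing never sees it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Pre-processing and model live in one Pipeline, so the scaler is
# re-fit on the training portion of every cross-validation fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", KNeighborsRegressor()),
])

# Each dict swaps a different estimator into the "model" step and
# tunes that estimator's own hyperparameters (illustrative values).
param_grid = [
    {"model": [KNeighborsRegressor()], "model__n_neighbors": [3, 5, 10, 20]},
    {"model": [Lasso(max_iter=10_000)], "model__alpha": [0.01, 0.1, 1.0, 10.0]},
]

search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_absolute_error")
search.fit(X_train, y_train)

# Pipeline.score delegates to the final regressor, which reports R^2.
test_r2 = search.best_estimator_.score(X_test, y_test)

print("Best model:", search.best_params_["model"])
print("Cross-validated MAE:", -search.best_score_)
print("Test-set R^2:", test_r2)
```

Because the whole pipeline is tuned as one object, the held-out test set is touched only once, after model selection is finished.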

This course is for you if you:

are comfortable using Python and the pandas package to read, transform, and reshape data
have experience making a variety of graphs with any Python package

Intermediate or expert familiarity with modeling or machine learning is not required.

Course Links

Notes

The pandas DataFrame.describe method is useful for getting descriptive summary statistics across an entire dataset: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html
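A minimal illustration of describe, using a small made-up DataFrame (the column names and values are hypothetical). By default only numeric columns are summarized; passing include="all" also reports count, unique, top, and freq for non-numeric columns.

```python
import pandas as pd

# Hypothetical example data with numeric and categorical columns.
df = pd.DataFrame({
    "age": [23, 35, 31, 52, 46],
    "income": [41000, 58000, 52000, 91000, 76000],
    "city": ["Austin", "Boston", "Austin", "Denver", "Boston"],
})

# Default: count, mean, std, min, quartiles, and max for numeric columns only.
numeric_summary = df.describe()

# include="all" adds unique/top/freq rows covering the categorical "city" column.
full_summary = df.describe(include="all")

print(numeric_summary)
print(full_summary)
```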
Prediction error plots (Yellowbrick), which compare actual and predicted target values across their range: https://www.scikit-yb.org/en/latest/api/regressor/peplot.html